International Journal of Data Science and Big Data Analytics
Volume 2, Issue 1, May 2022
Research Paper | Open Access
Detecting Offensive Language in Multi-Dialectal Arabic Social Media
Ahmed Fahmy1*
1Computer Science and Engineering Department, The American University in Cairo, Cairo, Egypt. E-mail: awael@aucegypt.edu
*Corresponding Author
Int.J.Data.Sci. & Big Data Anal. 2(1) (2022) 20-25, DOI: https://doi.org/10.51483/IJDSBDA.2.1.2022.20-25
Received: 13/02/2022 | Accepted: 09/04/2022 | Published: 05/05/2022
Reliance on social media has been increasing steadily from year to year. Because these platforms allow anonymous communication, people tend to share offensive comments, which can be problematic and potentially cause considerable harm to society. Addressing this issue calls for automated methods that detect offensive text on social media platforms. Research on this task is not as widely available for Arabic as for other languages. Drawing on recent breakthroughs in Arabic Natural Language Processing, we were able to detect offensive content on social media more accurately. Arabic poses a different challenge from English because it is a morphologically rich language. Transformer-based models such as BERT have recently achieved state-of-the-art results on various tasks. By building on the AraBERT pre-training, which has been shown to outperform multilingual BERT, and by using Arabic-specific pre-processing methods, we achieved better results than established approaches for this task. Specifically, the BERT-base model achieved an F1-score of 84.88% on a multi-platform, multi-dialect dataset.
Keywords: Arabic, Language, Multi-dialect, offensive, BERT
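The abstract refers to Arabic-specific pre-processing without listing the individual steps. The sketch below is only an illustration of normalization steps that are commonly applied to Arabic social media text before tokenization; it is not the paper's pipeline. The function name `normalize_arabic` and the choice of steps (dropping URLs and mentions, stripping diacritics and tatweel, unifying alef variants, collapsing elongated characters) are assumptions made for this example.

```python
import re

# All patterns below are illustrative assumptions, not the paper's rules.
_URL = re.compile(r"https?://\S+")
_MENTION = re.compile(r"@\w+")
_DIACRITICS = re.compile(r"[\u064B-\u0652]")           # tashkeel marks
_TATWEEL = "\u0640"                                    # elongation character
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")   # alef with madda/hamza
_REPEATS = re.compile(r"(.)\1{2,}")                    # 3+ repeats of one char


def normalize_arabic(text: str) -> str:
    """Apply a hypothetical normalization pass to one social media post."""
    text = _URL.sub(" ", text)                 # drop links
    text = _MENTION.sub(" ", text)             # drop user mentions
    text = _DIACRITICS.sub("", text)           # strip short-vowel marks
    text = text.replace(_TATWEEL, "")          # remove elongation
    text = _ALEF_VARIANTS.sub("\u0627", text)  # map variants to bare alef
    text = _REPEATS.sub(r"\1\1", text)         # cap character runs at two
    return " ".join(text.split())              # collapse whitespace


if __name__ == "__main__":
    print(normalize_arabic("@user http://x.co \u0645\u0631\u062d\u0628\u0627"))
```

Whether each step helps depends on the dialects and platforms in the dataset; for example, collapsing repeated characters discards emphasis that may itself be a signal of offensive tone.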
Copyright © SvedbergOpen. All rights reserved